133 research outputs found

    Multivariate Analysis of Tumour Gene Expression Profiles Applying Regularisation and Bayesian Variable Selection Techniques

    High-throughput microarray technology is here to stay, e.g. in oncology for tumour classification and gene expression profiling to predict cancer pathology and clinical outcome. The global objective of this thesis is to investigate multivariate methods that are suitable for this task. After introducing the problem and the biological background, an overview of multivariate regularisation methods is given in Chapter 3 and the binary classification problem is outlined (Chapter 4). The focus of applications presented in Chapters 5 to 7 is on sparse binary classifiers that are both parsimonious and interpretable. Particular emphasis is on sparse penalised likelihood and Bayesian variable selection models, all in the context of logistic regression. The thesis concludes with a final discussion chapter. The variable selection problem is particularly challenging here, since the number of variables is much larger than the sample size, which results in an ill-conditioned problem with many equally good solutions. Thus, one open problem is the stability of gene expression profiles. In a resampling study, various characteristics including stability are compared between a variety of classifiers applied to five gene expression data sets and validated on two independent data sets. Bayesian variable selection provides an alternative to resampling for estimating the uncertainty in the selection of genes. MCMC methods are used for model space exploration, but because of the high dimensionality, standard algorithms are computationally expensive and/or result in poor Markov chain mixing. A novel MCMC algorithm is presented that uses the dependence structure between input variables for finding blocks of variables to be updated together. This drastically improves mixing while keeping the computational burden acceptable. Several algorithms are compared in a simulation study. In an ovarian cancer application in Chapter 7, the best-performing MCMC algorithms are combined with parallel tempering and compared with an alternative method.

    Structured penalized regression for drug sensitivity prediction

    Large-scale in vitro drug sensitivity screens are an important tool in personalized oncology to predict the effectiveness of potential cancer drugs. The prediction of the sensitivity of cancer cell lines to a panel of drugs is a multivariate regression problem with high-dimensional heterogeneous multi-omics data as input data and with potentially strong correlations between the outcome variables which represent the sensitivity to the different drugs. We propose a joint penalized regression approach with structured penalty terms which allow us to utilize the correlation structure between drugs with group-lasso-type penalties and at the same time address the heterogeneity between omics data sources by introducing data-source-specific penalty factors to penalize different data sources differently. By combining integrative penalty factors (IPF) with tree-guided group lasso, we create the IPF-tree-lasso method. We present a unified framework to transform more general IPF-type methods to the original penalized method. Because the structured penalty terms have multiple parameters, we demonstrate how the interval-search Efficient Parameter Selection via Global Optimization (EPSGO) algorithm can be used to optimize multiple penalty parameters efficiently. Simulation studies show that IPF-tree-lasso can improve the prediction performance compared to other lasso-type methods, in particular for heterogeneous data sources. Finally, we employ the new methods to analyse data from the Genomics of Drug Sensitivity in Cancer project. Comment: Zhao Z, Zucknick M (2020). Structured penalized regression for drug sensitivity prediction. Journal of the Royal Statistical Society, Series C. 19 pages, 6 figures and 2 tables
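The idea of data-source-specific penalty factors, and of transforming such a problem back to an ordinary penalized one, can be illustrated for the plain lasso case. The sketch below is not the authors' IPF-tree-lasso implementation: it assumes a single response, a simple coordinate-descent solver, and hypothetical names (`lasso_cd`, `ipf_lasso_via_rescaling`, `source_of_feature`, `penalty_factors`). It relies on the standard fact that penalizing feature j with weight w_j is equivalent to fitting an unweighted lasso on the column X_j / w_j and dividing the resulting coefficient by w_j.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in lasso coordinate updates."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, weights=None, n_iter=500):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = X.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # X_j^T X_j / n for each column
    r = y.astype(float)                    # residual y - Xb (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]            # remove feature j from the fit
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam * w[j]) / col_sq[j]
            r -= X[:, j] * b[j]            # add the updated feature j back
    return b

def ipf_lasso_via_rescaling(X, y, lam, source_of_feature, penalty_factors):
    """Weighted lasso solved as an unweighted lasso on rescaled columns.

    source_of_feature[j] is the (hypothetical) data-source index of feature j;
    penalty_factors[s] is the penalty factor assumed for data source s.
    """
    w = np.asarray(penalty_factors, dtype=float)[np.asarray(source_of_feature)]
    c = lasso_cd(X / w, y, lam)            # ordinary lasso on X_j / w_j
    return c / w                           # map back to the original scale
```

For example, with two data sources of three features each and penalty factors `[1.0, 3.0]`, calling `ipf_lasso_via_rescaling(X, y, 0.05, [0, 0, 0, 1, 1, 1], [1.0, 3.0])` should agree with `lasso_cd` called with the corresponding per-feature weights, while penalizing the second source three times as strongly as the first.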

    Outliers and Influence Points in German Business Cycles

    In this paper, we examine the German business cycle (from 1955 to 1994) in order to identify univariate and multivariate outliers as well as influence points corresponding to Linear Discriminant Analysis. The locations of the corresponding observations are compared and economically interpreted.

    Combining heterogeneous subgroups with graph-structured variable selection priors for Cox regression

    Important objectives in cancer research are the prediction of a patient's risk based on molecular measurements such as gene expression data and the identification of new prognostic biomarkers (e.g. genes). In clinical practice, this is often challenging because patient cohorts are typically small and can be heterogeneous. In classical subgroup analysis, a separate prediction model is fitted using only the data of one specific cohort. However, this can lead to a loss of power when the sample size is small. Simple pooling of all cohorts, on the other hand, can lead to biased results, especially when the cohorts are heterogeneous. For this situation, we propose a new Bayesian approach suitable for continuous molecular measurements and survival outcome that identifies the important predictors and provides a separate risk prediction model for each cohort. It allows sharing information between cohorts to increase power by assuming a graph linking predictors within and across different cohorts. The graph helps to identify pathways of functionally related genes and genes that are simultaneously prognostic in different cohorts. Results demonstrate that our proposed approach is superior to the standard approaches in terms of prediction performance and increased power in variable selection when the sample size is small. Comment: under review, 19 pages, 10 figures

    Structured Bayesian variable selection for multiple correlated response variables and high-dimensional predictors

    It is becoming increasingly common to study complex associations between multiple phenotypes and high-dimensional genomic features in biomedicine. However, this requires flexible and efficient joint statistical models if there are correlations between multiple response variables and between high-dimensional predictors. We propose a structured multivariate Bayesian variable selection model to identify sparse predictors associated with multiple correlated response variables. The approach makes use of known structure information between the multiple response variables and high-dimensional predictors via a Markov random field (MRF) prior for the latent indicator variables of the coefficient matrix of a sparse seemingly unrelated regressions (SSUR) model. The structure information included in the MRF prior can improve the model performance (i.e., variable selection and response prediction) compared to other common priors. In addition, we employ random effects to capture heterogeneity of grouped samples. The proposed approach is validated by simulation studies and applied to a pharmacogenomic study which includes pharmacological profiling and multi-omics data (i.e., gene expression, copy number variation and mutation) from in vitro anti-cancer drug sensitivity screening.
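To make the role of an MRF prior on latent inclusion indicators concrete, here is a minimal sketch. The abstract does not state the exact form of the prior, so the code assumes a commonly used parameterization, log p(gamma) = d * sum(gamma) + e * gamma^T G gamma + const, where `G` is a symmetric adjacency matrix encoding the known structure, `d` controls overall sparsity and `e` controls how strongly linked indicators are encouraged to be selected together. The function name and argument names are hypothetical.

```python
import numpy as np

def mrf_log_prior_unnormalized(gamma, G, d, e):
    """Unnormalized log prior of a binary indicator vector under an assumed MRF form.

    gamma : 0/1 vector of latent inclusion indicators (e.g. a vectorized
            indicator matrix for predictor-response pairs)
    G     : symmetric 0/1 adjacency matrix over the indicators (zero diagonal)
    d     : sparsity parameter (more negative favours fewer selected variables)
    e     : structure parameter (larger favours selecting linked indicators jointly)
    """
    gamma = np.asarray(gamma, dtype=float)
    G = np.asarray(G, dtype=float)
    return d * gamma.sum() + e * (gamma @ G @ gamma)
```

With one edge linking indicators 0 and 1, selecting the linked pair receives a higher prior weight than selecting an unlinked pair of the same size, which is the mechanism by which structure information can guide variable selection.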

    Tissue-specific identification of multi-omics features for pan-cancer drug response prediction

    Publisher Copyright: © 2022 The Author(s). Current statistical models for drug response prediction and biomarker identification fall short in leveraging the shared and unique information from various cancer tissues and multi-omics profiles. We developed the mix-lasso model, which introduces an additional sample group penalty term to capture tissue-specific effects of features on pan-cancer response prediction. The mix-lasso model takes into account both the similarity between drug responses (i.e., multi-task learning) and the heterogeneity between multi-omics data (i.e., multi-modal learning). When applied to a large-scale pharmacogenomics dataset from the Cancer Therapeutics Response Portal, mix-lasso enabled accurate drug response predictions and identification of tissue-specific predictive features in the presence of various degrees of missing data, drug-drug correlations, and high-dimensional and correlated genomic and molecular features that often hinder the use of statistical approaches in drug response modeling. Compared to the tree lasso model, mix-lasso identified a smaller number of tissue-specific features, hence making the model more interpretable and stable for drug discovery applications. Peer reviewed.

    Dissecting the Prognostic Significance and Functional Role of Progranulin in Chronic Lymphocytic Leukemia

    Chronic lymphocytic leukemia (CLL) is known for its strong dependency on the tumor microenvironment. We found progranulin (GRN), a protein that has been linked to inflammation and cancer, to be upregulated in the serum of CLL patients compared to healthy controls, and increased GRN levels to be associated with an increased hazard for disease progression and death. This raised the question of whether GRN is a functional driver of CLL. We observed that recombinant GRN did not directly affect viability, activation, or proliferation of primary CLL cells in vitro. However, GRN secretion was induced in co-cultures of CLL cells with stromal cells that enhanced CLL cell survival. Gene expression profiling and protein analyses revealed that primary mesenchymal stromal cells (MSCs) in co-culture with CLL cells acquire a cancer-associated fibroblast-like phenotype. Despite its upregulation in the co-cultures, GRN treatment of MSCs did not mimic this effect. To test the relevance of GRN for CLL in vivo, we made use of the Eμ-TCL1 CLL mouse model. As we detected strong GRN expression in myeloid cells, we performed adoptive transfer of Eμ-TCL1 leukemia cells to bone marrow chimeric Grn−/− mice that lack GRN in hematopoietic cells. We observed that CLL-like disease developed comparably in Grn−/− chimeras and the respective control mice. In conclusion, serum GRN is found to be strongly upregulated in CLL, which indicates potential use as a prognostic marker, but there is no evidence that elevated GRN functionally drives the disease.